Abstractive text summarization is the task of generating a short, concise summary that captures the salient ideas of a source text. The generated summary may contain new phrases and sentences that do not appear in the source text.
Contains reviews of Amazon products. Each review consists of a header (the summary) and a review body (the document). We used this dataset to fine-tune our model for summarization.
Split | Examples |
Training | 20,000 |
Validation | 5,000 |
Test | 5,000 |
Contains articles from multiple news sources, each paired with a human-written extreme (very short) summary.
Split | Examples |
Training | 204,045 |
Validation | 11,332 |
Test | 11,332 |
Jordan Hill, Brittany Covington and Tesfaye Cooper, all 18, and Tanishia Covington, 24, appeared in a Chicago court on Friday.
The four have been charged with hate crimes and aggravated kidnapping and battery, among other things.
An online fundraiser for their victim has collected $51,000 (£42,500) so far.
Denying the four suspects bail, Judge Maria Kuriakos Ciesil asked: "Where was your sense of decency?"
Prosecutors told the court the beating started in a van and continued at a house, where the suspects allegedly forced the 18-year-old white victim, who suffers from schizophrenia and attention deficit disorder, to drink toilet water and kiss the floor.
Police allege the van was earlier stolen by Mr Hill, who is also accused of demanding $300 from the victim's mother while they held him captive, according to the Chicago Tribune.
The court was also told the suspects stuffed a sock into his mouth, taped his mouth shut and bound his hands with a belt.
In a video made for Facebook Live which was watched millions of times, the assailants can be heard making derogatory statements against white people and Donald Trump.
The victim had been dropped off at a McDonald's to meet Mr Hill - who was one of his friends - on 31 December.
He was found by a police officer on Tuesday, 3 January, a day after he was reported missing by his parents.
Prosecutors say the suspects each face two hate crimes counts, one because of the victim's race and the other because of his disabilities.
Four people accused of kidnapping and torturing a mentally disabled man in a "racially motivated" attack streamed on Facebook have been denied bail.
Various models have been proposed for the text summarization problem, such as BERT, BART, PEGASUS, and T5. Apart from the encoder-only BERT, these models follow an encoder-decoder architecture coupled with an attention mechanism. They achieved state-of-the-art performance on the summarization task by pre-training powerful language models. For example, BART was pre-trained on a variety of denoising tasks such as text infilling, token deletion, and sentence permutation. These pre-trained models are then adapted to downstream tasks by fine-tuning their weights on the dataset of interest. There are many considerations when selecting a model. Our main criteria were the small size of the model and the ease of training it, since we wanted to obtain quick feedback. Taking this into account, we decided to work with T5-small, the T5 variant with the smallest number of encoder-decoder layers. The T5 model was introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". The same model was pre-trained to perform many downstream tasks such as translation, question answering, and summarization.
The main architecture of T5-small is as follows. It consists of a standard encoder-decoder Transformer architecture. The ReLU activation function is used in the feed-forward blocks by default; however, we also experimented with the gated-GELU (GEGLU) activation function, a GLU variant previously proposed for Transformer feed-forward layers. Lastly, it has a self-attention mechanism in both the encoder and the decoder. We also experimented with using cross-attention instead.
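To make the feed-forward variant concrete, here is a minimal NumPy sketch of a gated-GELU feed-forward block. The weight names and dimensions are illustrative, not T5's actual parameters.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, common in transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu_ffn(x, W, V, W_out):
    """Gated-GELU feed-forward block: GELU(xW) is gated elementwise by a second
    linear projection xV, then projected back to the model dimension.
    Weight names are illustrative, not T5's actual parameter names."""
    return (gelu(x @ W) * (x @ V)) @ W_out

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((1, d_model))
W = rng.standard_normal((d_model, d_ff))
V = rng.standard_normal((d_model, d_ff))
W_out = rng.standard_normal((d_ff, d_model))

y = geglu_ffn(x, W, V, W_out)
print(y.shape)  # (1, 8)
```

Compared with the default ReLU block, the gated variant uses two input projections instead of one, so implementations typically shrink the hidden dimension to keep the parameter count comparable.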
Throughout the project, we built an abstractive summarization deep learning model. We experimented with different models but ultimately settled on T5. We explored several hyper-parameters, such as the number of beams, the number of layers, the activation function, the cross-attention mechanism, and staged training. As the outcome of this project, we propose utilizing the gated-GELU activation function, the cross-attention mechanism, and a larger number of beams in beam search.
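To illustrate the role of the beam count among these hyper-parameters, a toy beam search over a generic next-token scoring function can be sketched as follows. The `step_fn` interface is hypothetical; a real decoder scores the full vocabulary with the model at each step.

```python
from heapq import nlargest

def beam_search(step_fn, start_token, end_token, num_beams=4, max_len=10):
    """Toy beam search. `step_fn(seq)` returns (token, log_prob) candidates for
    the next token; this interface is illustrative, not a real decoder API."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq):
                if tok == end_token:
                    finished.append((seq + [tok], score + logp))
                else:
                    candidates.append((seq + [tok], score + logp))
        if not candidates:
            break
        # Keep only the `num_beams` highest-scoring partial sequences.
        beams = nlargest(num_beams, candidates, key=lambda c: c[1])
    if not finished:
        finished = beams
    return max(finished, key=lambda c: c[1])[0]

# Toy demo: a tiny "vocabulary" where the model always prefers token "a".
def step(seq):
    if len(seq) == 1:
        return [("a", -0.5), ("b", -1.0)]
    return [("</s>", 0.0)]

best = beam_search(step, "<s>", "</s>", num_beams=2, max_len=5)
print(best)  # ['<s>', 'a', '</s>']
```

With `num_beams=1` this reduces to greedy decoding; larger beam counts keep more partial hypotheses alive, which is why increasing the number of beams can improve summary quality at the cost of decoding time.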